Extracting WordNet-like Top Concepts from Explanatory Dictionaries
Abstract
Correct interpretation of a text frequently requires knowledge of the semantic categories of nouns, especially in languages with free word order. For example, in Spanish the phrases pintó un cuadro un pintor (lit. painted a picture a painter) and pintó un pintor un cuadro (lit. painted a painter a picture) mean the same thing: ‘a painter painted a picture’. The only way to tell the subject from the object is to know that pintor ‘painter’ is a causal agent and cuadro ‘picture’ is a thing. We present a method for extracting semantic information of this kind from existing machine-readable, human-oriented explanatory dictionaries. First, we extract an is-a hierarchy from the dictionary and manually mark the categories of a few top-level concepts. Then, for a given word, we follow the hierarchy upward until we find a concept whose semantic category is known. Applying this procedure to two different human-oriented Spanish dictionaries yields additional information compared with using Spanish EuroWordNet alone. In addition, we report the results of an experiment conducted to evaluate the similarity of the word classifications obtained with this method.
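The lookup step described in the abstract (follow is-a links upward until a manually categorized concept is reached) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hypernym links, the two marked top concepts, and the function name `semantic_category` are hypothetical and are not taken from the dictionaries or from the authors' implementation. A visited set is added because definitions in human-oriented dictionaries may be circular.

```python
from typing import Dict, Optional

# Hypothetical is-a links (word -> hypernym), as might be extracted from
# the genus term of each dictionary definition. Illustrative only.
HYPERNYM: Dict[str, str] = {
    "pintor": "artista",
    "artista": "persona",
    "cuadro": "obra",
    "obra": "cosa",
}

# Hypothetical manually marked top-level concepts and their categories.
TOP_CATEGORY: Dict[str, str] = {
    "persona": "causal agent",
    "cosa": "thing",
}


def semantic_category(
    word: str,
    hypernym: Dict[str, str] = HYPERNYM,
    top: Dict[str, str] = TOP_CATEGORY,
) -> Optional[str]:
    """Walk up the is-a chain until a concept with a known category is found.

    Returns None if the chain ends, or loops back on itself, before any
    categorized concept is reached.
    """
    seen = set()
    current: Optional[str] = word
    while current is not None and current not in seen:
        if current in top:
            return top[current]
        seen.add(current)                 # guard against circular definitions
        current = hypernym.get(current)   # None when no hypernym is recorded
    return None


if __name__ == "__main__":
    print(semantic_category("pintor"))
    print(semantic_category("cuadro"))
```

With these toy links, pintor resolves through artista and persona, and cuadro through obra and cosa, which is the distinction needed to tell subject from object in the example sentences.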
Related resources
Extracting Lexico-conceptual Knowledge for Developing Persian WordNet
Semantic lexicons and lexical ontologies are major resources in natural language processing. Developing such resources is a time-consuming task for which some automatic methods have been proposed. This paper describes some methods used in the semi-automatic development of FarsNet, a lexical ontology for the Persian language. FarsNet includes the Persian WordNet with more than 10000 synsets of nouns,...
Word Association Thesaurus As a Resource for Building WordNet
The goal of the present paper is to report on ongoing research on applying psycholinguistic resources to building a WordNet-like lexicon of the Russian language. We survey the different kinds of linguistic data that can be extracted from a Word Association Thesaurus, a resource representing the results of a large-scale free association test. In addition, we will give a comparison o...
Processing and extracting data from an open dictionary of the Portuguese language
Synonyms dictionaries are useful resources for natural language processing. Unfortunately their availability in digital format is limited, as publishing companies do not release their dictionaries in open digital formats. Dicionário-Aberto (Simões and Farinha, 2010) is an open and free digital synonyms dictionary for the Portuguese language. It is under public domain and in textual digital form...
Adjectives in RussNet
This paper deals with the problem of structuring adjectives in a wordnet. We will present several methods of dealing with this problem based on the usage of different language resources: frequency lists, text corpora, word association norms, and explanatory dictionaries. The work has been developed within the framework of the RussNet project aiming at building a wordnet for Russian. Three types...
Development of the Hungarian WordNet Ontology and its Application to Information Extraction
This paper presents an outline of the construction process of the Hungarian WordNet Ontology, and the description of an information extraction application utilizing the ontology. and MorphoLogic) in a 3-year project funded by the European Union ECOP program (GVOP-AKF-2004-3.1.1.) The Princeton WordNet (WN) linguistic ontology ([1]) has become a standard and an invaluable semantic resource withi...